Consumer credit risk: Individual probability estimates using machine learning
Abstract
Consumer credit scoring is often treated as a classification task in which clients receive either a good or a bad credit status. Default probabilities provide more detailed information about the creditworthiness of consumers, and they are usually estimated by logistic regression. Here, we present a general framework for estimating individual consumer credit risks using machine learning methods. Since a probability is an expected value, all nonparametric regression approaches that are consistent for the mean are also consistent for the probability estimation problem. Among others, random forests (RF), k-nearest neighbors (kNN), and bagged k-nearest neighbors (bNN) belong to this class of consistent nonparametric regression approaches. We apply the machine learning methods and an optimized logistic regression to a large dataset of complete payment histories of short-term installment credits. We demonstrate probability estimation in Random Jungle, an RF package written in C++ with a generalized framework for fast tree growing, probability estimation, and classification. We also describe an algorithm for tuning the terminal node size for probability estimation. We demonstrate that regression RF outperforms the optimized logistic regression model, kNN, and bNN on the test data of the short-term installment credits. © 2013 Elsevier Ltd. All rights reserved.
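The abstract's central argument, that a default probability is the conditional mean of a 0/1 outcome and can therefore be estimated by any regression method that is consistent for the mean, can be illustrated with a minimal sketch. The code below is not the paper's implementation and does not use Random Jungle. It is a pure-Python toy with a single feature, and all function names, parameters (`k`, `n_bags`, `subsample`), and the synthetic data are assumptions made for illustration. It shows kNN and a subsample-based bagged nearest neighbor estimate, plus the Brier score as one common way to assess probability estimates.

```python
import random


def knn_probability(train_x, train_y, x, k):
    """Estimate P(y = 1 | x) as the mean 0/1 label of the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def bagged_nn_probability(train_x, train_y, x, n_bags=200, subsample=0.3, seed=0):
    """Bagged nearest neighbor: average the 1-NN label over random subsamples.

    Subsampling without replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(train_x)
    m = max(1, int(subsample * n))  # size of each subsample
    total = 0
    for _ in range(n_bags):
        idx = rng.sample(range(n), m)
        nearest = min(idx, key=lambda i: abs(train_x[i] - x))
        total += train_y[nearest]
    return total / n_bags


def brier_score(probs, ys):
    """Mean squared difference between estimated probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, ys)) / len(ys)


if __name__ == "__main__":
    # Synthetic data (hypothetical): true default probability rises with x.
    rng = random.Random(42)
    xs = [rng.random() for _ in range(500)]
    ys = [1 if rng.random() < 0.1 + 0.6 * x else 0 for x in xs]

    for query in (0.2, 0.8):
        p_knn = knn_probability(xs, ys, query, k=50)
        p_bnn = bagged_nn_probability(xs, ys, query)
        print(f"x={query}: kNN={p_knn:.3f}, bNN={p_bnn:.3f}, true={0.1 + 0.6 * query:.3f}")
```

Both estimators return an average of observed 0/1 outcomes, so the result always lies in [0, 1] and needs no separate calibration step. The same idea carries over to regression forests, where the average is taken over the training outcomes in a terminal node.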
Similar resources
Consumer credit-risk models via machine-learning algorithms
We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk. By combining customer transactions and credit bureau data from January 2005 to April 2009 for a sample of a major commercial bank’s customers, we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies...
An Evaluation of Support Vector Machines in Consumer Credit Analysis
This thesis examines a support vector machine approach for determining consumer credit. The support vector machine using a radial basis function (RBF) kernel is compared to a previous implementation of a decision tree machine learning model. The dataset used for evaluation was provided by a large bank and includes relevant consumer-level data, including transactions and credit-bureau data. The ...
Paper 1323-2017: Real AdaBoost: Boosting for Credit Scorecards and Similarity to WOE Logistic Regression
AdaBoost is a machine learning algorithm that builds a series of small decision trees, adapting each tree to predict difficult cases missed by the previous trees and combining all trees into a single model. We will discuss the AdaBoost methodology and introduce the extension called Real AdaBoost. Real AdaBoost comes from a strong academic pedigree: its authors are pioneers of machine learning a...
Modelling the credit risk for portfolios of consumer loans: Analogies with corporate loan models
The Internal Ratings Based (IRB) approach suggested in the New Basel Accord regulations (BIS 2005) uses a capital allocation formula derived from a Merton style structural model of the credit risk of portfolios of corporate loans. Yet this formula is being applied in the case of consumer loans as well as corporate loans. This has highlighted that although there are a number of well established ...
Data mining with Support Vector Machine
Machine learning is considered a subfield of artificial intelligence and is concerned with the development of techniques and methods that enable computers to learn. This paper introduces SVM, a set of techniques and methodologies developed for machine learning tasks. Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. S...
Journal: Expert Syst. Appl.
Volume: 40
Publication date: 2013